Dosovitskiy A, Brox T. Inverting Visual Representations with Convolutional Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
1. Overview
This paper proposes a new approach to studying image representations: inverting them with an up-convolutional neural network
- applied to shallow representations: HOG, SIFT, LBP
- applied to deep representations: DNN (AlexNet)
1.1. Conclusion
1.1.1. Representation of AlexNet
- Features from all layers of the network preserve the precise colors and the rough positions of objects in the image
- In higher layers, almost all information about the input image is contained in the pattern of non-zero activations, not in their precise values
- In layer FC8, most information about the input image is contained in the small probabilities of the classes outside the network's top-5 predictions
1.2. Related Work
- Local Binary Pattern (LBP). LBP features are not differentiable
- SIFT. keypoint-based representation
1.2.1. Existing Method Based on Gradient Descent
- inverts a differentiable image representation phi by gradient descent, so it cannot be applied to non-differentiable features such as LBP
- optimizes the difference between feature vectors, not the image reconstruction error
- requires per-image optimization at test time
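The gradient-descent baseline can be illustrated with a toy example. Here phi is a random linear map standing in for a differentiable feature extractor (an assumption for illustration; all names are hypothetical), and the "image" x is optimized at test time to match a target feature vector:

```python
import numpy as np

# Toy sketch of the prior inversion approach: minimize the feature-matching
# loss || phi(x) - phi(x0) ||^2 over the input x by gradient descent.
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64))      # phi(x) = A @ x, stand-in for a real feature map
x0 = rng.standard_normal(64)           # "image" whose features we want to invert
target = A @ x0                        # observed feature vector phi(x0)

x = np.zeros(64)                       # start from a blank image
lr = 1e-3
for _ in range(2000):                  # this loop runs at test time, per image
    grad = 2 * A.T @ (A @ x - target)  # gradient of the feature-matching loss
    x -= lr * grad

loss = np.sum((A @ x - target) ** 2)   # near zero after convergence
```

Note the contrast with the paper's method: here the loss compares feature vectors, not pixels, and the optimization must be rerun for every new image, whereas the up-convolutional network is trained once and inverts features with a single forward pass.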
1.3. Methods
1.3.1. Loss Function
- phi. feature vector
- w. parameters of CNN
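With this notation, the training objective can be written as follows (a sketch consistent with the paper's setup: f is the up-convolutional network, x_i the training images, and the loss is the squared image reconstruction error, not a feature-space distance):

```latex
\hat{w} = \arg\min_{w} \sum_{i} \left\| x_i - f\big(\phi(x_i),\, w\big) \right\|_2^2
```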
1.3.2. Network
1.3.3. HOG and LBP
- HOG feature. W/8 x H/8 x 31
- LBP feature. W/16 x H/16 x 58
- the features are processed further by convolutional layers until the spatial size is 64 times smaller than the input
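The spatial bookkeeping above can be sketched as follows. The helper names and the use of stride-2 convolutions to reach the 64x downsampling factor are assumptions for illustration; the cell sizes and channel counts are from the notes:

```python
# One 31-bin HOG descriptor per 8x8 cell; 58 uniform-LBP bins per 16x16 cell.
def hog_shape(w, h):
    return w // 8, h // 8, 31

def lbp_shape(w, h):
    return w // 16, h // 16, 58

# HOG already downsamples by 8; additional stride-2 convolutions (assumed)
# halve the feature map until the total factor reaches 64.
w, h = 256, 256
hw, hh, hc = hog_shape(w, h)
downsample = 8
while downsample < 64:
    hw, hh = hw // 2, hh // 2
    downsample *= 2
```

For a 256x256 input this gives a 32x32x31 HOG map, reduced to 4x4 spatially before the up-convolutional decoder takes over.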
1.3.4. Sparse SIFT
- N keypoints.
- each keypoint contains: coordinate (x, y), scale s, orientation α, feature descriptor f
- split the image into cells of size d x d; this yields a W/d x H/d grid
- feature. W/d x H/d x (D+5)
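A minimal sketch of rasterizing the sparse keypoints onto that grid so a convolutional network can consume them. The exact contents of the 5 extra channels are an assumption here (in-cell offset, scale, and the orientation encoded as sine/cosine); the function name and keypoint tuple layout are hypothetical:

```python
import numpy as np

def sift_to_grid(keypoints, W, H, d, D):
    """Scatter N sparse SIFT keypoints into a dense W/d x H/d x (D+5) tensor."""
    grid = np.zeros((H // d, W // d, D + 5))
    for (x, y, s, a, f) in keypoints:          # f is the D-dim descriptor
        cy, cx = int(y) // d, int(x) // d      # cell containing the keypoint
        grid[cy, cx, :D] = f                   # descriptor channels
        # 5 assumed extra channels: in-cell offset, scale, orientation
        grid[cy, cx, D:] = [x % d, y % d, s, np.sin(a), np.cos(a)]
    return grid
```

Empty cells stay all-zero, which gives the network an implicit "no keypoint here" signal; cells with more than one keypoint keep only the last one in this simplified version.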